R is a free software environment for statistical computing and graphics, and one of the most widely used. ggplot2 is a data visualization package for R that can be used to produce publication-quality graphics. This workshop is designed to introduce you to R and ggplot2 as well as RStudio, KnitR, Slidify, and Shiny.
R is a central piece of the Big Data Analytics Revolution. For example, see http://opensource.com/business/14/7/interview-david-smith-revolution-analytics for an article entitled “Big data influencer on how R is paving the way”.
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.4 evaluate_0.5.5 formatR_0.10 htmltools_0.2.4
## [5] knitr_1.6 rmarkdown_0.3.10 stringr_0.6.2 tools_3.1.2
## [9] yaml_2.1.13
You also need to install LaTeX if you want to generate PDF files from KnitR.
Use a GUI tool like SourceTree to clone the repository or execute the following commands in a terminal window:
Phils-MacBook-Pro:Mine pcannata$ pwd
/Users/pcannata
Phils-MacBook-Pro:~ pcannata$ git clone https://github.com/pcannata/DataVisualization.git
Cloning into 'DataVisualization'...
remote: Counting objects: 74, done.
remote: Compressing objects: 100% (60/60), done.
remote: Total 74 (delta 6), reused 67 (delta 4)
Unpacking objects: 100% (74/74), done.
Checking connectivity... done.
Phils-MacBook-Pro:~ pcannata$ ls -a DataVisualization/
. .. .git README.md RWorkshop
Create a new text file named .Rprofile.
Put the following into .Rprofile:
require("ggplot2")
require("gplots")
require("plyr")
require("grid")
require("RCurl")
require("reshape2")
require("rstudio")
require("tableplot")
require(tidyr)
require(dplyr)
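Because `require()` only warns when a package is missing, a startup file like this can fail quietly. The following is a minimal sketch, assuming you want to check which packages still need installing before relying on .Rprofile. The package list is just a subset of the ones above, and the names `installed` and `not_installed` are illustrative.

```r
# Packages the workshop's .Rprofile tries to load (subset shown for illustration).
pkgs <- c("ggplot2", "plyr", "reshape2", "tidyr", "dplyr")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Install these first: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
}
```

Running this once in the console before creating .Rprofile should tell you whether the `require()` calls will succeed.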
This is something that is easily done in Excel:
How would you do the same thing in R?
source("../00 Overview/Overview.R", echo = TRUE)
##
## > x <- c(1, 2, 3, 4, 5)
##
## > y <- 3 * x
##
## > y1 <- 2^x
##
## > x
## [1] 1 2 3 4 5
##
## > y
## [1] 3 6 9 12 15
##
## > y1
## [1] 2 4 8 16 32
##
## > df <- data.frame(x, y, y1)
##
## > df
## x y y1
## 1 1 3 2
## 2 2 6 4
## 3 3 9 8
## 4 4 12 16
## 5 5 15 32
##
## > library(reshape2)
##
## > mdf <- melt(df, id.vars = "x", measure.vars = c("y",
## + "y1"))
##
## > mdf
## x variable value
## 1 1 y 3
## 2 2 y 6
## 3 3 y 9
## 4 4 y 12
## 5 5 y 15
## 6 1 y1 2
## 7 2 y1 4
## 8 3 y1 8
## 9 4 y1 16
## 10 5 y1 32
##
## > library(ggplot2)
##
## > ggplot(mdf, aes(x = x, y = value, color = variable)) +
## + geom_line()
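To make the reshaping step less opaque, here is a rough base-R sketch of what `melt()` does to `df` above: it stacks the `y` and `y1` columns into a single `value` column and records which column each value came from. The name `long` is illustrative, and this is not how reshape2 is implemented internally.

```r
x <- c(1, 2, 3, 4, 5)
df <- data.frame(x, y = 3 * x, y1 = 2^x)

# Stack the measure columns by hand: one row per (x, variable) pair.
long <- data.frame(
  x        = rep(df$x, times = 2),
  variable = rep(c("y", "y1"), each = nrow(df)),
  value    = c(df$y, df$y1)
)

long
```

This long layout is what lets ggplot map `variable` to color and draw one line per original column.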
See also http://cran.r-project.org/doc/manuals/r-devel/R-lang.html, http://www.r-tutor.com/r-introduction, and http://www.cookbook-r.com/
source("../01 Basic R/Basic.R", echo = TRUE)
##
## > "Variables"
## [1] "Variables"
##
## > v <- 211
##
## > v
## [1] 211
##
## > "Global Variables"
## [1] "Global Variables"
##
## > g <<- 234
##
## > g
## [1] 234
##
## > "Vectors"
## [1] "Vectors"
##
## > v1 <- c(1, 2, 3, 4, 5)
##
## > v1
## [1] 1 2 3 4 5
##
## > v2 <- 1:11
##
## > v2
## [1] 1 2 3 4 5 6 7 8 9 10 11
##
## > v3 <- -5:5
##
## > v3
## [1] -5 -4 -3 -2 -1 0 1 2 3 4 5
##
## > "Vector Operations"
## [1] "Vector Operations"
##
## > v1
## [1] 1 2 3 4 5
##
## > v1 + 2
## [1] 3 4 5 6 7
##
## > v2
## [1] 1 2 3 4 5 6 7 8 9 10 11
##
## > sqrt(v2)
## [1] 1.000 1.414 1.732 2.000 2.236 2.449 2.646 2.828 3.000 3.162 3.317
##
## > v2
## [1] 1 2 3 4 5 6 7 8 9 10 11
##
## > v3
## [1] -5 -4 -3 -2 -1 0 1 2 3 4 5
##
## > v2 + v3
## [1] -4 -2 0 2 4 6 8 10 12 14 16
##
## > length(v3)
## [1] 11
##
## > mean(4:22)
## [1] 13
##
## > "Data Types: Numeric, Character, Dates, Logical(TRUE, FALSE)"
## [1] "Data Types: Numeric, Character, Dates, Logical(TRUE, FALSE)"
##
## > "Missing Data: NA"
## [1] "Missing Data: NA"
##
## > v <- c(1, 2, NA, 3)
##
## > v
## [1] 1 2 NA 3
##
## > "Missing Data: NULL"
## [1] "Missing Data: NULL"
##
## > v <- c(1, 2, NULL, 3)
##
## > v
## [1] 1 2 3
##
## > "Functions"
## [1] "Functions"
##
## > "Functions will be introduced in the section on ggplot below, however, let's have a look at the apropos() function:"
## [1] "Functions will be introduced in the section on ggplot below, however, let's have a look at the apropos() function:"
##
## > apropos("mean")
## [1] ".colMeans" ".rowMeans" "colMeans" "kmeans"
## [5] "mean" "mean_cl_boot" "mean_cl_normal" "mean_sdl"
## [9] "mean_se" "mean.Date" "mean.default" "mean.difftime"
## [13] "mean.POSIXct" "mean.POSIXlt" "rowMeans" "weighted.mean"
##
## > "Data Structures: Dataframes, Lists, Matrices, and Arrays. Only Dataframes will be addressed in this workshop."
## [1] "Data Structures: Dataframes, Lists, Matrices, and Arrays. Only Dataframes will be addressed in this workshop."
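The difference between NA and NULL shown above is easy to miss, so here is a short sketch that makes it explicit. The variable names `v_na` and `v_null` are chosen for this example only.

```r
v_na   <- c(1, 2, NA, 3)    # NA is a placeholder that occupies a slot
v_null <- c(1, 2, NULL, 3)  # NULL is dropped when the vector is built

length(v_na)                 # 4
length(v_null)               # 3

is.na(v_na)                  # FALSE FALSE  TRUE FALSE
mean(v_na)                   # NA, because one element is unknown
mean(v_na, na.rm = TRUE)     # 2, the mean of 1, 2 and 3
```

Many summary functions accept `na.rm = TRUE`, which is why it appears in the `mean(price, na.rm = TRUE)` call later in this workshop.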
A data frame is used for storing data tables. It is a list of vectors of equal length. For example, the following variable df is a data frame containing three vectors n, s, b.
n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b) # df is a data frame
head(df)
## n s b
## 1 2 aa TRUE
## 2 3 bb FALSE
## 3 5 cc TRUE
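As a small follow-up, here are a few common ways to look inside the data frame built above. `stringsAsFactors = FALSE` is added here as an assumption so that the text column stays character on older versions of R.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b, stringsAsFactors = FALSE)

str(df)          # one line per column with its type
df$n             # a single column as a vector
df[2, ]          # the second row, still a data frame
df[df$b, "s"]    # values of s in the rows where b is TRUE
```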
Dataframes can be loaded from databases, CSV files, Excel, etc. Loading dataframes from an Oracle database is discussed later in this workshop.
See also http://www.r-tutor.com/r-introduction/data-frame
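Loading from a CSV file is probably the most common case. The sketch below writes a tiny CSV to a temporary file first so that it runs on its own; the file and the names `csv_path` and `df_csv` are made up for this example.

```r
# Write a small data frame to a temporary CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(n = c(2, 3, 5), s = c("aa", "bb", "cc")),
          csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps text columns as character.
df_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
df_csv
```

With a real file you would pass its path to `read.csv()` directly.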
Many R packages come with demo dataframes. The ggplot2 package comes with a demo dataframe called diamonds, which we will use for this workshop.
source("../02 R Dataframes/Dataframes.R", echo = TRUE)
##
## > library("ggplot2")
##
## > "Displaying the top few rows of a dataframe:"
## [1] "Displaying the top few rows of a dataframe:"
##
## > head(diamonds)
## carat cut color clarity depth table price x y z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
##
## > "Summary of each variable in the dataframe."
## [1] "Summary of each variable in the dataframe."
##
## > names(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
##
## > `?`(diamonds)
##
## > summary(diamonds)
## carat cut color clarity
## Min. :0.200 Fair : 1610 D: 6775 SI1 :13065
## 1st Qu.:0.400 Good : 4906 E: 9797 VS2 :12258
## Median :0.700 Very Good:12082 F: 9542 SI2 : 9194
## Mean :0.798 Premium :13791 G:11292 VS1 : 8171
## 3rd Qu.:1.040 Ideal :21551 H: 8304 VVS2 : 5066
## Max. :5.010 I: 5422 VVS1 : 3655
## J: 2808 (Other): 2531
## depth table price x
## Min. :43.0 Min. :43.0 Min. : 326 Min. : 0.00
## 1st Qu.:61.0 1st Qu.:56.0 1st Qu.: 950 1st Qu.: 4.71
## Median :61.8 Median :57.0 Median : 2401 Median : 5.70
## Mean :61.8 Mean :57.5 Mean : 3933 Mean : 5.73
## 3rd Qu.:62.5 3rd Qu.:59.0 3rd Qu.: 5324 3rd Qu.: 6.54
## Max. :79.0 Max. :95.0 Max. :18823 Max. :10.74
##
## y z
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 4.72 1st Qu.: 2.91
## Median : 5.71 Median : 3.53
## Mean : 5.73 Mean : 3.54
## 3rd Qu.: 6.54 3rd Qu.: 4.04
## Max. :58.90 Max. :31.80
##
##
## > "Selecting a subset of columns from a dataframe:"
## [1] "Selecting a subset of columns from a dataframe:"
##
## > head(subset(diamonds, select = c(carat, cut)))
## carat cut
## 1 0.23 Ideal
## 2 0.21 Premium
## 3 0.23 Good
## 4 0.29 Premium
## 5 0.31 Good
## 6 0.24 Very Good
##
## > "Selecting a subset of rows from a dataframe:"
## [1] "Selecting a subset of rows from a dataframe:"
##
## > head(subset(diamonds, cut == "Ideal" & price > 5000))
## carat cut color clarity depth table price x y z
## 11417 1.16 Ideal E SI2 62.7 56.0 5001 6.69 6.73 4.21
## 11418 1.16 Ideal E SI2 59.9 57.0 5001 6.80 6.82 4.08
## 11422 1.07 Ideal I SI1 61.7 56.1 5002 6.57 6.59 4.06
## 11423 1.10 Ideal H SI2 62.0 56.5 5002 6.58 6.63 4.09
## 11424 1.20 Ideal J SI1 62.1 55.0 5002 6.81 6.84 4.24
## 11431 1.14 Ideal H SI1 61.6 57.0 5003 6.70 6.75 4.14
##
## > "Find average price group by color (plyr package is needed)"
## [1] "Find average price group by color (plyr package is needed)"
##
## > library("plyr")
##
## > ddply(subset(diamonds, cut == "Ideal" & price > 5000),
## + ~color, summarise, o = mean(price, na.rm = TRUE))
## color o
## 1 D 9057
## 2 E 9065
## 3 F 9704
## 4 G 9392
## 5 H 8923
## 6 I 9663
## 7 J 9407
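The subset-then-average pattern above can also be written without plyr. The following sketch uses base R's `aggregate()` on the built-in `mtcars` data, swapped in for `diamonds` so that it runs without ggplot2 installed; the names `powerful` and `avg_mpg` are illustrative.

```r
# Keep cars with more than 100 horsepower, then average mpg per cylinder count.
# This mirrors: subset rows -> group by a column -> take the mean.
powerful <- subset(mtcars, hp > 100)
avg_mpg <- aggregate(mpg ~ cyl, data = powerful, FUN = mean)
avg_mpg
```

Applied to the workshop data, the same shape of call would be `aggregate(price ~ color, data = subset(diamonds, cut == "Ideal" & price > 5000), FUN = mean)`.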
For more on subsetting dataframes see http://www.ats.ucla.edu/stat/r/faq/subset_R.htm
source("../02 RestfulReL/Access Oracle Database.R", echo = TRUE)
##
## > library("RCurl")
## Loading required package: bitops
##
## > df <- data.frame(eval(parse(text = substring(getURL(URLencode("http://129.152.144.84:5001/rest/native/?query=\"select * from emp\""),
## + httphea .... [TRUNCATED]
##
## > head(df)
## EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
## 1 7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 null 20
## 2 7499 ALLEN SALESMAN 7698 1981-02-20 00:00:00 1600 300 30
## 3 7521 WARD SALESMAN 7698 1981-02-22 00:00:00 1250 500 30
## 4 7566 JONES MANAGER 7839 1981-04-02 00:00:00 2975 null 20
## 5 7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250 1400 30
## 6 7698 BLAKE MANAGER 7839 1981-05-01 00:00:00 2850 null 30
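The call above is truncated, but the visible part evaluates text returned by the server as R code and wraps the result in a data frame. Here is a minimal sketch of that pattern, assuming the service returns something like an R `list(...)` expression. The `payload` string is made up for illustration and no network call is made. The sketch also shows one way to turn the literal "null" strings seen in the COMM column into NA.

```r
# Hypothetical response text; the real service output is not shown in this document.
payload <- 'list(EMPNO = c(7369, 7499), ENAME = c("SMITH", "ALLEN"), COMM = c("null", "300"))'

# parse() turns the text into an expression and eval() runs it.
# Only do this with text from a source you trust, since it executes arbitrary code.
emp <- data.frame(eval(parse(text = payload)), stringsAsFactors = FALSE)

# Replace the "null" marker with NA, then convert the column to numbers.
emp$COMM[emp$COMM == "null"] <- NA
emp$COMM <- as.numeric(emp$COMM)
emp
```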
ggplot2 is an R package for data exploration and visualization. It produces production-quality graphics and lets you slice and dice your data in many different ways. ggplot2 uses a general scheme for data visualization that breaks graphs up into semantic components such as scales and layers. In contrast to other graphics packages, ggplot2 lets the user add, remove, or alter components in a plot at a high level of abstraction.
See also http://ggplot2.org/, http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf, and https://groups.google.com/forum/#!forum/ggplot2
source("../03 ggplot/Plots.R", echo = TRUE)
##
## > options(java.parameters = "-Xmx2g")
##
## > head(diamonds)
## carat cut color clarity depth table price x y z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
##
## > ggplot(data = diamonds) + geom_histogram(aes(x = carat))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > ggplot(data = diamonds) + geom_density(aes(x = carat,
## + fill = "gray50"))
##
## > ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
##
## > p <- ggplot(diamonds, aes(x = carat, y = price)) +
## + geom_point(aes(color = color))
##
## > p + facet_wrap(~color)
##
## > p + facet_grid(cut ~ clarity)
##
## > p <- ggplot(diamonds, aes(x = carat)) + geom_histogram(aes(color = color),
## + binwidth = max(diamonds$carat)/30)
##
## > p + facet_wrap(~color)
##
## > p + facet_grid(cut ~ clarity)
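The `stat_bin` message above appears because no bin width was given. Here is a small sketch, assuming ggplot2 is installed, that sets the bin width explicitly and facets the result. It uses the built-in `mtcars` data and a bin width of 2.5 mpg, both chosen purely for illustration.

```r
library(ggplot2)

# Histogram of fuel economy with an explicit bin width, one panel per cylinder count.
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5) +
  facet_wrap(~cyl)

p  # printing the object draws the plot
```

Because the plot is stored in `p`, further layers or facets can be added to it later, as the workshop script does.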
Chapter 7 of “R for Everyone” has many more examples of ggplot2 plots.
source("../03 ggplot/plotFunction.R", echo = TRUE)
##
## > FigureNum <<- 0
##
## > ggplot_func <- function(df, Title = "Diamonds", Legend = "color",
## + PointColor = c("red", "blue", "green", "yellow", "grey",
## + "black" .... [TRUNCATED]
##
## > p1 <- ggplot_func(diamonds)
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
##
## > p1
##
## > p2 <- ggplot_func(diamonds, YMin = 5000, YMax = 15000)
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
##
## > p2
## Warning: Removed 40868 rows containing missing values (geom_point).
##
## > p3 <- ggplot_func(subset(diamonds, cut == "Premium"),
## + Legend = "cut")
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
##
## > p3
##
## > p4 <- ggplot_func(diamonds, Legend = "clarity", PointColor = c("red",
## + "blue", "green", "yellow", "grey", "black", "purple", "orange"))
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
##
## > p4
##
## > library("grid")
##
## > png("4diamonds.png", width = 25, height = 20, units = "in",
## + res = 72)
##
## > grid.newpage()
##
## > pushViewport(viewport(layout = grid.layout(2, 2)))
##
## > print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
##
## > print(p2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
## Warning: Removed 40868 rows containing missing values (geom_point).
##
## > print(p3, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
##
## > print(p4, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))
##
## > dev.off()
## pdf
## 2
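The 2 x 2 arrangement above relies on grid viewports. Below is a stripped-down sketch of the same layout idea. It draws labelled rectangles in place of the ggplot objects, since the `ggplot_func` definition is truncated above and is not reproduced here, and it writes to a temporary PDF rather than a PNG. The names `out_file`, `cell_row`, and `cell_col` are illustrative.

```r
library(grid)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 6)

grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 2)))  # 2 rows x 2 columns

labels <- c("p1", "p2", "p3", "p4")
for (i in seq_along(labels)) {
  cell_row <- (i - 1) %/% 2 + 1   # 1, 1, 2, 2
  cell_col <- (i - 1) %% 2 + 1    # 1, 2, 1, 2
  pushViewport(viewport(layout.pos.row = cell_row, layout.pos.col = cell_col))
  grid.rect()                     # outline of the cell
  grid.text(labels[i])            # stand-in for print(p, vp = ...)
  popViewport()
}

invisible(dev.off())              # close the device so the file is written
file.exists(out_file)
```

In the workshop script, each `print(p, vp = viewport(...))` call plays the role of the loop body here.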
You should now be able to open RWorkshop/00 Doc/4diamonds.png. It should look like the following plot.
KnitR is an R package designed to generate dynamic reports using a mix of R, LaTeX, and Rmarkdown (see http://rmarkdown.rstudio.com/?version=0.98.945&mode=desktop).
See also http://yihui.name/knitr/ and http://kbroman.github.io/knitr_knutshell/
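To make the mix of prose and R concrete, the sketch below writes a minimal Rmarkdown file from R. The file content and the names `fence`, `rmd_lines`, and `rmd_path` are illustrative and are not taken from the workshop's own .Rmd files. Rendering is left commented out because it needs the rmarkdown package.

```r
# Three backticks mark the start and end of an R chunk in Rmarkdown.
fence <- paste(rep("`", 3), collapse = "")

rmd_lines <- c(
  "---",
  "title: \"Minimal report\"",
  "output: html_document",
  "---",
  "",
  "The mean of 1 to 10 is computed when the document is knit:",
  "",
  paste0(fence, "{r}"),
  "mean(1:10)",
  fence
)

rmd_path <- file.path(tempdir(), "minimal.Rmd")
writeLines(rmd_lines, rmd_path)

# rmarkdown::render(rmd_path)  # uncomment to knit to HTML (requires rmarkdown)
```

In RStudio, opening the resulting file and pressing the Knit button should have the same effect as the `render()` call.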
Simple examples can be found in “04 KnitR/doc1.Rmd” and “04 KnitR/doc2.Rmd”. These can generate HTML, PDF, and Word documents. The output from knitting doc2.Rmd is:
You can use Slidify to generate HTML slide decks using only the Rmarkdown language.
See also http://slidify.org and http://slidify.org/start.html
Follow the instructions in “05 Slidify/slidify setup.R” to install and run slidify. You should be able to produce a slide deck with a first slide that looks something like the following.
Cool trick: any GitHub repo with a branch called gh-pages gets served as a website. If the content of that repo is the stuff of websites (HTML, CSS), you get free web hosting. So, create a branch called gh-pages and push to it.
The shiny R package allows you to build interactive web-based applications using only R, with no knowledge of HTML, CSS, or JavaScript needed. You just need to write two scripts, conventionally named ui.R and server.R (see the example files in the 06Shiny directory).
See also http://shiny.rstudio.com and http://shiny.rstudio.com/tutorial
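As a rough illustration of the two pieces a Shiny app needs, a user interface definition and a server function, here is a minimal single-file sketch. It assumes the shiny package is installed. The input and output names (`n`, `hist`) are invented for this example and are not taken from the 06Shiny files.

```r
library(shiny)

# User interface: one slider and one plot placeholder.
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal sample")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch in a browser
```

In a two-script layout, the `ui` definition and the `server` function would each live in their own file inside the app directory.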
To run the shiny app in the 06Shiny directory, run the following from the main RWorkshop directory (make sure the working directory is set to this directory):
library(shiny)
runApp("06Shiny") # Make sure there are no spaces in the string argument to runApp
This should open the application in a browser. You can also access it in a browser at http://127.0.0.1:6837. It should look like the following.
The example above ran the shiny app on your local machine. To share it with others this way, you have to send them the R files, and they need to have R installed and know a little about it.
Instead, you can remotely host shiny apps and then just send people links. Get a free account at shinyapps.io/signup.html and give it a try.
library("devtools", lib.loc="/Library/Frameworks/R.framework/Versions/3.0/Resources/library")
install_github( repo = "shinyapps", username="rstudio" )
shinyapps::setAccountInfo(name='pcannata', token='3ECF447A741004F6A8B7208C9ED778E1', secret='. . .')
# library(shinyapps)
getwd()
## [1] "/Users/pcannata/Mine/UT/GitRepositories/DataVisualization/RWorkshop/00 Doc"
# Uncomment the following line to deploy the app.
#deployApp("../06Shiny")
Now you can try the app at https://pcannata.shinyapps.io/06Shiny/
See also https://www.shinyapps.io/ and http://shiny.rstudio.com/articles/shinyapps.html
See also http://cran.r-project.org/doc/manuals/r-devel/R-lang.html, http://www.r-tutor.com/r-introduction, and http://www.cookbook-r.com/
source("../07 Data Wrangling/Data Wrangling.R", echo = TRUE)
##
## > require(tidyr)
## Loading required package: tidyr
##
## > require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## > tbl_df(diamonds)
## Source: local data frame [53,940 x 10]
##
## carat cut color clarity depth table price x y z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39
## .. ... ... ... ... ... ... ... ... ... ...
##
## > View(diamonds)
##
## > select(diamonds, cut, clarity) %>% tbl_df
## Source: local data frame [53,940 x 2]
##
## cut clarity
## 1 Ideal SI2
## 2 Premium SI1
## 3 Good VS1
## 4 Premium VS2
## 5 Good SI2
## 6 Very Good VVS2
## 7 Very Good VVS1
## 8 Very Good SI1
## 9 Fair VS2
## 10 Very Good VS1
## .. ... ...
##
## > diamonds %>% select(cut, clarity) %>% tbl_df
## Source: local data frame [53,940 x 2]
##
## cut clarity
## 1 Ideal SI2
## 2 Premium SI1
## 3 Good VS1
## 4 Premium VS2
## 5 Good SI2
## 6 Very Good VVS2
## 7 Very Good VVS1
## 8 Very Good SI1
## 9 Fair VS2
## 10 Very Good VS1
## .. ... ...
##
## > x <- diamonds %>% select(cut, clarity) %>% tbl_df
##
## > diamonds %>% select(cut, clarity) %>% filter(cut ==
## + "Good") %>% tbl_df
## Source: local data frame [4,906 x 2]
##
## cut clarity
## 1 Good VS1
## 2 Good SI2
## 3 Good SI1
## 4 Good SI1
## 5 Good SI1
## 6 Good SI2
## 7 Good VS1
## 8 Good VS1
## 9 Good SI1
## 10 Good VS2
## .. ... ...
##
## > diamonds %>% select(cut, clarity) %>% filter(cut %in%
## + c("Good", "Fair")) %>% tbl_df
## Source: local data frame [6,516 x 2]
##
## cut clarity
## 1 Good VS1
## 2 Good SI2
## 3 Fair VS2
## 4 Good SI1
## 5 Good SI1
## 6 Good SI1
## 7 Good SI2
## 8 Good VS1
## 9 Good VS1
## 10 Good SI1
## .. ... ...
##
## > diamonds %>% select(cut, clarity) %>% filter(cut %in%
## + c("Good", "Fair"), clarity == "VS1") %>% tbl_df
## Source: local data frame [818 x 2]
##
## cut clarity
## 1 Good VS1
## 2 Good VS1
## 3 Good VS1
## 4 Good VS1
## 5 Good VS1
## 6 Good VS1
## 7 Good VS1
## 8 Good VS1
## 9 Fair VS1
## 10 Good VS1
## .. ... ...
##
## > diamonds %>% select(cut, clarity) %>% filter(cut %in%
## + c("Good", "Fair"), clarity == "VS1" | is.na(cut)) %>% tbl_df
## Source: local data frame [818 x 2]
##
## cut clarity
## 1 Good VS1
## 2 Good VS1
## 3 Good VS1
## 4 Good VS1
## 5 Good VS1
## 6 Good VS1
## 7 Good VS1
## 8 Good VS1
## 9 Fair VS1
## 10 Good VS1
## .. ... ...
##
## > diamonds %>% select(cut, clarity, x, y, z) %>% filter(cut %in%
## + c("Good", "Fair"), clarity == "VS1" | is.na(cut)) %>% tbl_df
## Source: local data frame [818 x 5]
##
## cut clarity x y z
## 1 Good VS1 4.05 4.07 2.31
## 2 Good VS1 4.06 4.08 2.37
## 3 Good VS1 3.83 3.85 2.46
## 4 Good VS1 4.19 4.24 2.46
## 5 Good VS1 5.71 5.76 3.40
## 6 Good VS1 5.81 5.77 3.31
## 7 Good VS1 5.97 5.92 3.53
## 8 Good VS1 5.74 5.72 3.48
## 9 Fair VS1 5.89 5.80 3.46
## 10 Good VS1 5.56 5.59 3.63
## .. ... ... ... ... ...
##
## > diamonds %>% select(cut, clarity, x, y, z) %>% filter(cut %in%
## + c("Good", "Fair"), clarity == "VS1" | is.na(cut)) %>% mutate(sum = x +
## + .... [TRUNCATED]
## Source: local data frame [818 x 6]
##
## cut clarity x y z sum
## 1 Good VS1 4.05 4.07 2.31 10.43
## 2 Good VS1 4.06 4.08 2.37 10.51
## 3 Good VS1 3.83 3.85 2.46 10.14
## 4 Good VS1 4.19 4.24 2.46 10.89
## 5 Good VS1 5.71 5.76 3.40 14.87
## 6 Good VS1 5.81 5.77 3.31 14.89
## 7 Good VS1 5.97 5.92 3.53 15.42
## 8 Good VS1 5.74 5.72 3.48 14.94
## 9 Fair VS1 5.89 5.80 3.46 15.15
## 10 Good VS1 5.56 5.59 3.63 14.78
## .. ... ... ... ... ... ...
##
## > ndf <- diamonds %>% select(cut, clarity, x, y, z) %>%
## + filter(cut %in% c("Good", "Fair"), clarity == "VS1" | is.na(cut)) %>%
## + mutate(sum .... [TRUNCATED]
##
## > ndf
## Source: local data frame [818 x 6]
##
## cut clarity x y z sum
## 1 Good VS1 4.05 4.07 2.31 10.43
## 2 Good VS1 4.06 4.08 2.37 10.51
## 3 Good VS1 3.83 3.85 2.46 10.14
## 4 Good VS1 4.19 4.24 2.46 10.89
## 5 Good VS1 5.71 5.76 3.40 14.87
## 6 Good VS1 5.81 5.77 3.31 14.89
## 7 Good VS1 5.97 5.92 3.53 15.42
## 8 Good VS1 5.74 5.72 3.48 14.94
## 9 Fair VS1 5.89 5.80 3.46 15.15
## 10 Good VS1 5.56 5.59 3.63 14.78
## .. ... ... ... ... ... ...
##
## > pmin(c(1:5), (5:1))
## [1] 1 2 3 2 1
##
## > pmax(c(1:5), (5:1))
## [1] 5 4 3 4 5
##
## > c(1, 1, 2, 0, 4, 3, 5) %>% cummin()
## [1] 1 1 1 0 0 0 0
##
## > c(1, 1, 2, 5, 4, 3, 5) %>% cummax()
## [1] 1 1 2 5 5 5 5
##
## > c(1, 1, 2, 3, 4, 3, 5) %>% cumsum()
## [1] 1 2 4 7 11 14 19
##
## > c(1, 1, 2, 3, 4, 3, 5) %>% cumprod()
## [1] 1 1 2 6 24 72 360
##
## > c(1, 1, 2, 3, 4, 3, 5) %>% between(2, 4)
## [1] FALSE FALSE TRUE TRUE TRUE TRUE FALSE
##
## > c(1, 1, 2, 5, 4, 3, 5) %>% cume_dist()
## [1] 0.2857 0.2857 0.4286 1.0000 0.7143 0.5714 1.0000
##
## > c(1:5) %>% cume_dist()
## [1] 0.2 0.4 0.6 0.8 1.0
##
## > c(1, 1:5) %>% cume_dist()
## [1] 0.3333 0.3333 0.5000 0.6667 0.8333 1.0000
##
## > c(1:5) %>% cummean()
## [1] 1.0 1.5 2.0 2.5 3.0
##
## > c(1:5) %>% lead() - c(1:5)
## [1] 1 1 1 1 NA
##
## > c(1:5) %>% lag() - c(1:5)
## [1] NA -1 -1 -1 -1
##
## > c(1:10)
## [1] 1 2 3 4 5 6 7 8 9 10
##
## > c(1:10) %>% ntile(4)
## [1] 1 1 1 2 2 3 3 3 4 4
##
## > diamonds %>% mutate(price_percent = cume_dist(price)) %>%
## + filter(price_percent <= 0.2) %>% arrange(desc(price_percent)) %>%
## + tbl_df
## Source: local data frame [10,783 x 11]
##
## carat cut color clarity depth table price x y z
## 1 0.38 Ideal I VVS2 62.0 55 836 4.65 4.67 2.89
## 2 0.41 Good G SI1 64.3 54 836 4.72 4.68 3.02
## 3 0.41 Ideal G SI2 61.0 57 836 4.82 4.79 2.93
## 4 0.41 Premium H SI2 62.2 56 836 4.80 4.72 2.96
## 5 0.30 Ideal D VS1 62.2 56 835 4.31 4.27 2.67
## 6 0.30 Ideal D VS1 61.0 57 835 4.32 4.31 2.63
## 7 0.35 Very Good D VVS2 60.0 58 835 4.57 4.59 2.75
## 8 0.41 Very Good G VS2 62.7 60 835 4.71 4.75 2.96
## 9 0.38 Very Good F VS1 60.4 60 835 4.70 4.74 2.85
## 10 0.36 Ideal E VS1 61.4 54 835 4.59 4.63 2.83
## .. ... ... ... ... ... ... ... ... ... ...
## Variables not shown: price_percent (dbl)
##
## > bottom20_diamonds <- diamonds %>% mutate(price_percent = cume_dist(price)) %>%
## + filter(price_percent <= 0.2) %>% arrange(desc(price_percent)) .... [TRUNCATED]
##
## > diamonds %>% mutate(price_percent = cume_dist(price)) %>%
## + filter(price_percent >= 0.8) %>% arrange(desc(price_percent)) %>%
## + tbl_df
## Source: local data frame [10,791 x 11]
##
## carat cut color clarity depth table price x y z
## 1 2.29 Premium I VS2 60.8 60 18823 8.50 8.47 5.16
## 2 2.00 Very Good G SI1 63.5 56 18818 7.90 7.97 5.04
## 3 1.51 Ideal G IF 61.7 55 18806 7.37 7.41 4.56
## 4 2.07 Ideal G SI2 62.5 55 18804 8.20 8.13 5.11
## 5 2.00 Very Good H SI1 62.8 57 18803 7.95 8.00 5.01
## 6 2.29 Premium I SI1 61.8 59 18797 8.52 8.45 5.24
## 7 2.04 Premium H SI1 58.1 60 18795 8.37 8.28 4.84
## 8 2.00 Premium I VS1 60.8 59 18795 8.13 8.02 4.91
## 9 1.71 Premium F VS2 62.3 59 18791 7.57 7.53 4.70
## 10 2.15 Ideal G SI2 62.6 54 18791 8.29 8.35 5.21
## .. ... ... ... ... ... ... ... ... ... ...
## Variables not shown: price_percent (dbl)
##
## > top20_diamonds <- diamonds %>% mutate(price_percent = cume_dist(price)) %>%
## + filter(price_percent >= 0.8) %>% arrange(desc(price_percent)) %>% .... [TRUNCATED]
##
## > diamonds %>% mutate(price_percent = cume_dist(price)) %>%
## + filter(price_percent <= 0.2 | price_percent >= 0.8) %>% ggplot(aes(x = price,
## + .... [TRUNCATED]
##
## > diamonds %>% mutate(minxy = pmin(x, y)) %>% tbl_df
## Source: local data frame [53,940 x 11]
##
## carat cut color clarity depth table price x y z minxy
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 3.95
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 3.84
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 4.05
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63 4.20
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 4.34
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 3.94
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 3.95
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 4.07
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 3.78
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39 4.00
## .. ... ... ... ... ... ... ... ... ... ... ...
##
## > diamonds %>% mutate(cummin_x = cummin(x)) %>% tbl_df
## Source: local data frame [53,940 x 11]
##
## carat cut color clarity depth table price x y z cummin_x
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 3.95
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 3.89
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 3.89
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63 3.89
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 3.89
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 3.89
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 3.89
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 3.89
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 3.87
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39 3.87
## .. ... ... ... ... ... ... ... ... ... ... ...
##
## > diamonds %>% mutate(cumsum_x = cumsum(x)) %>% tbl_df
## Source: local data frame [53,940 x 11]
##
## carat cut color clarity depth table price x y z cumsum_x
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 3.95
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 7.84
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 11.89
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63 16.09
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 20.43
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 24.37
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 28.32
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 32.39
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 36.26
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39 40.26
## .. ... ... ... ... ... ... ... ... ... ... ...
##
## > diamonds %>% mutate(between_x = between(x, 4, 4.1)) %>%
## + tbl_df
## Source: local data frame [53,940 x 11]
##
## carat cut color clarity depth table price x y z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39
## .. ... ... ... ... ... ... ... ... ... ...
## Variables not shown: between_x (lgl)
##
## > diamonds %>% mutate(lead_z = lead(z) - z) %>% tbl_df
## Source: local data frame [53,940 x 11]
##
## carat cut color clarity depth table price x y z lead_z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 -0.12
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 0.00
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 0.32
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63 0.12
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 -0.27
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 -0.01
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 0.06
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 -0.04
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 -0.10
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39 0.34
## .. ... ... ... ... ... ... ... ... ... ... ...
##
## > diamonds %>% mutate(lag_z = lag(z) - z) %>% tbl_df
## Source: local data frame [53,940 x 11]
##
## carat cut color clarity depth table price x y z lag_z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 NA
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 0.12
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 0.00
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63 -0.32
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 -0.12
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 0.27
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 0.01
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 -0.06
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 0.04
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39 0.10
## .. ... ... ... ... ... ... ... ... ... ... ...
##
## > diamonds %>% mutate(ntile_z = ntile(z, 100)) %>% arrange(desc(ntile_z)) %>%
## + tbl_df
## Source: local data frame [53,940 x 11]
##
## carat cut color clarity depth table price x y z ntile_z
## 1 2.14 Fair J I1 69.4 57 5405 7.74 7.70 5.36 100
## 2 2.15 Fair J I1 65.5 57 5430 8.01 7.95 5.23 100
## 3 2.22 Fair J I1 66.7 56 5607 8.04 8.02 5.36 100
## 4 2.01 Fair I I1 67.4 58 5696 7.71 7.64 5.17 100
## 5 2.27 Fair J I1 67.6 55 5733 8.05 8.00 5.43 100
## 6 2.00 Fair H I1 69.8 54 5914 7.60 7.56 5.29 100
## 7 2.03 Fair H I1 66.6 57 6002 7.81 7.75 5.19 100
## 8 2.49 Fair J I1 66.3 58 6289 8.26 8.18 5.45 100
## 9 2.01 Fair G I1 70.2 57 6315 7.53 7.50 5.27 100
## 10 2.14 Fair H I1 66.4 56 6328 8.00 7.92 5.29 100
## .. ... ... ... ... ... ... ... ... ... ... ...
##
## > diamonds %>% mutate(ntile_z = ntile(z, 100)) %>% group_by(ntile_z) %>%
## + summarise(n = n()) %>% tbl_df
## Source: local data frame [100 x 2]
##
## ntile_z n
## 1 1 540
## 2 2 539
## 3 3 540
## 4 4 539
## 5 5 539
## 6 6 540
## 7 7 539
## 8 8 540
## 9 9 539
## 10 10 539
## .. ... ...
##
## > diamonds %>% summarise(mean = mean(x), sum = sum(x,
## + y, z), n = n())
## mean sum n
## 1 5.731 809338 53940
##
## > diamonds %>% group_by(cut, color) %>% summarise(mean = mean(x),
## + sum = sum(x, y, z), n = n())
## Source: local data frame [35 x 5]
## Groups: cut
##
## cut color mean sum n
## 1 Fair D 6.018 2579 163
## 2 Fair E 5.909 3470 224
## 3 Fair F 5.991 4901 312
## 4 Fair G 6.174 5103 314
## 5 Fair H 6.579 5241 303
## 6 Fair I 6.564 3019 175
## 7 Fair J 6.747 2111 119
## 8 Good D 5.620 9770 662
## 9 Good E 5.618 13758 933
## 10 Good F 5.693 13587 909
## .. ... ... ... ... ...
##
## > diamonds %>% group_by(cut, color) %>% summarise(mean = mean(x),
## + sum = sum(x, y, z), n = n()) %>% ungroup %>% summarize(sum(n))
## Source: local data frame [1 x 1]
##
## sum(n)
## 1 53940
##
## > data.frame(x = c(1, 1, 1, 2, 2), y = c(5:1), z = (1:5)) %>%
## + arrange(desc(x)) %>% tbl_df
## Source: local data frame [5 x 3]
##
## x y z
## 1 2 2 4
## 2 2 1 5
## 3 1 5 1
## 4 1 4 2
## 5 1 3 3
##
## > data.frame(x = c(1, 1, 1, 2, 2), y = c(5:1), z = (1:5)) %>%
## + arrange(desc(x), y) %>% tbl_df
## Source: local data frame [5 x 3]
##
## x y z
## 1 2 1 5
## 2 2 2 4
## 3 1 3 3
## 4 1 4 2
## 5 1 5 1
##
## > diamonds %>% group_by(cut, color) %>% summarise(mean = mean(x),
## + sum = sum(x, y, z), n = n()) %>% arrange(n)
## Source: local data frame [35 x 5]
## Groups: cut
##
## cut color mean sum n
## 1 Fair J 6.747 2111 119
## 2 Fair D 6.018 2579 163
## 3 Fair I 6.564 3019 175
## 4 Fair E 5.909 3470 224
## 5 Fair H 6.579 5241 303
## 6 Fair F 5.991 4901 312
## 7 Fair G 6.174 5103 314
## 8 Good J 6.377 5139 307
## 9 Good I 6.254 8569 522
## 10 Good D 5.620 9770 662
## .. ... ... ... ... ...
##
## > diamonds %>% group_by(cut, color) %>% summarise(mean = mean(x),
## + sum = sum(x, y, z), n = n()) %>% arrange(desc(n), cut, color)
## Source: local data frame [35 x 5]
## Groups: cut
##
## cut color mean sum n
## 1 Fair G 6.174 5103 314
## 2 Fair F 5.991 4901 312
## 3 Fair H 6.579 5241 303
## 4 Fair E 5.909 3470 224
## 5 Fair I 6.564 3019 175
## 6 Fair D 6.018 2579 163
## 7 Fair J 6.747 2111 119
## 8 Good E 5.618 13758 933
## 9 Good F 5.693 13587 909
## 10 Good G 5.850 13379 871
## .. ... ... ... ... ...
##
## > diamonds %>% group_by(cut, color, clarity) %>% summarise(mean_carat = mean(carat)) %>%
## + ggplot(aes(x = cut, y = mean_carat, color = color)) + .... [TRUNCATED]
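For readers who want to see what a few of the dplyr window functions above compute, here are rough base-R equivalents, checked against the small vectors used earlier. The helper names ending in `_base` are made up for this sketch. They are not part of dplyr and ignore edge cases such as NA handling.

```r
# Proportion of values less than or equal to each value.
cume_dist_base <- function(x) rank(x, ties.method = "max") / length(x)

# Shift values one position towards the front (lead) or the back (lag).
lead_base <- function(x) c(x[-1], NA)
lag_base  <- function(x) c(NA, x[-length(x)])

round(cume_dist_base(c(1, 1, 2, 5, 4, 3, 5)), 4)
# 0.2857 0.2857 0.4286 1.0000 0.7143 0.5714 1.0000

lead_base(1:5) - 1:5   # 1 1 1 1 NA
lag_base(1:5) - 1:5    # NA -1 -1 -1 -1
```

Seen this way, the `price_percent <= 0.2` filter above keeps the rows whose price is at or below the 20th percentile of all prices.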
source("../08 eval(parse vs. json/ParseEval vs JSON.R", echo = TRUE)
source("../09 Joining Data/Joining Data.R", echo = TRUE)
##
## > require("jsonlite")
## Loading required package: jsonlite
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:utils':
##
## View
##
## > require(dplyr)
##
## > ddf <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/?query=\"select * from DIAMONDS\""),
## + httpheader = c(DB = "jdbc:o ..." ... [TRUNCATED]
##
## > tbl_df(ddf)
## Source: local data frame [53,940 x 11]
##
## diamond_id carat cut color clarity depth tbl price x y z
## 1 1 0.23 Ideal E SI2 61.5 55 null 3.95 3.98 2.43
## 2 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 10 0.23 null H VS1 59.4 61 338 4.00 4.05 2.39
## .. ... ... ... ... ... ... ... ... ... ... ...
##
## > sdf <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/?query=\"select * from diam_sale\""),
## + httpheader = c(DB = "jdbc: ..." ... [TRUNCATED]
##
## > tbl_df(sdf)
## Source: local data frame [64,839 x 4]
##
## SALE_ID RETAILER_ID DIAMOND_ID SALES_DATE
## 1 918 16 769 2013-05-10 00:00:00
## 2 919 41 770 2010-12-15 00:00:00
## 3 920 40 771 2010-02-22 00:00:00
## 4 921 33 772 2009-11-05 00:00:00
## 5 922 33 772 2009-11-05 00:00:00
## 6 923 46 773 2011-11-07 00:00:00
## 7 924 9 774 2011-01-14 00:00:00
## 8 925 16 775 2011-12-23 00:00:00
## 9 926 16 775 2011-12-23 00:00:00
## 10 927 26 776 2011-12-25 00:00:00
## .. ... ... ... ...
##
## > rdf <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/?query=\"select * from diam_retailer\""),
## + httpheader = c(DB = "j ..." ... [TRUNCATED]
##
## > tbl_df(rdf)
## Source: local data frame [50 x 2]
##
## RETAILER_ID NAME
## 1 0 ZALE CORP
## 2 1 STERLING JEWELERS INC
## 3 2 FRED MEYER JEWELERS
## 4 3 HELZBERG DIAMONDS
## 5 4 ULTRA STORES INC
## 6 5 SAMUELS JEWELERS
## 7 6 TIFFANY CO
## 8 7 ROGERS ENTERPRISES
## 9 8 BEN BRIDGE JEWELER
## 10 9 DON ROBERTO
## .. ... ...
##
## > names(ddf)
## [1] "diamond_id" "carat" "cut" "color" "clarity"
## [6] "depth" "tbl" "price" "x" "y"
## [11] "z"
##
## > names(sdf)
## [1] "SALE_ID" "RETAILER_ID" "DIAMOND_ID" "SALES_DATE"
##
## > names(rdf)
## [1] "RETAILER_ID" "NAME"
##
## > colnames(ddf) <- toupper(names(ddf))
##
## > dsdf <- inner_join(ddf, sdf, by = "DIAMOND_ID")
##
## > inner_join(dsdf, rdf, by = "RETAILER_ID") %>% tbl_df
## Source: local data frame [64,839 x 15]
##
## DIAMOND_ID CARAT CUT COLOR CLARITY DEPTH TBL PRICE X Y Z
## 1 1 0.23 Ideal E SI2 61.5 55 null 3.95 3.98 2.43
## 2 1 0.23 Ideal E SI2 61.5 55 null 3.95 3.98 2.43
## 3 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 4 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 5 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 6 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 7 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 8 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 9 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 10 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## .. ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: SALE_ID (int), RETAILER_ID (int), SALES_DATE (fctr),
## NAME (fctr)
##
## > colnames(ddf) <- toupper(names(ddf))
##
## > inner_join(ddf, sdf, by = "DIAMOND_ID") %>% inner_join(.,
## + rdf, by = "RETAILER_ID") %>% ggplot(aes(x = CARAT, y = NAME,
## + color = CUT)) + .... [TRUNCATED]
##
## > joindf <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/?query=\"select * from DIAMONDS d join diam_sale s on (d.\\\"diamond ..." ... [TRUNCATED]
##
## > joindf %>% ggplot(aes(x = carat, y = NAME, color = cut)) +
## + geom_point()
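The chained joins above can be tried without the REST server. The sketch below uses three hypothetical miniature tables (`diamonds_small`, `sales_small`, `retailers_small`) whose column names mirror the output above; the rows themselves are made up for illustration. It also repeats the `toupper()` step, since `inner_join()` matches on column names that must agree exactly.

```r
library(dplyr)

# Hypothetical miniature tables; only the column names follow the output above.
diamonds_small <- data.frame(
  diamond_id = c(1, 2, 3),
  cut = c("Ideal", "Premium", "Good"),
  stringsAsFactors = FALSE
)
sales_small <- data.frame(
  SALE_ID = c(10, 11, 12, 13),
  RETAILER_ID = c(0, 1, 1, 2),
  DIAMOND_ID = c(1, 1, 2, 4),
  stringsAsFactors = FALSE
)
retailers_small <- data.frame(
  RETAILER_ID = c(0, 1),
  NAME = c("ZALE CORP", "STERLING JEWELERS INC"),
  stringsAsFactors = FALSE
)

# Upper-case the column names so "DIAMOND_ID" exists in both tables.
colnames(diamonds_small) <- toupper(names(diamonds_small))

# Diamond 3 has no sale and sale 13 has no diamond, so both are dropped.
joined <- diamonds_small %>%
  inner_join(sales_small, by = "DIAMOND_ID") %>%
  inner_join(retailers_small, by = "RETAILER_ID")

print(joined)
```

Diamond 1 appears twice in the result because it matches two sales, which is the same effect that makes the joined table above have as many rows as `sdf` rather than `ddf`.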
source("../10 ListsForIfFunctionsPng/List Indexing.R", echo = TRUE)
##
## > l <- list(a = 1:10, b = 11:20)
##
## > l
## $a
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $b
## [1] 11 12 13 14 15 16 17 18 19 20
##
##
## > l[1]
## $a
## [1] 1 2 3 4 5 6 7 8 9 10
##
##
## > l[2]
## $b
## [1] 11 12 13 14 15 16 17 18 19 20
##
##
## > l[[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## > l[[2]]
## [1] 11 12 13 14 15 16 17 18 19 20
##
## > l$a
## [1] 1 2 3 4 5 6 7 8 9 10
##
## > l$b
## [1] 11 12 13 14 15 16 17 18 19 20
##
## > ll <- list(1:10, 11:20)
##
## > ll
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [1] 11 12 13 14 15 16 17 18 19 20
##
##
## > ll[1]
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
##
## > ll[2]
## [[1]]
## [1] 11 12 13 14 15 16 17 18 19 20
##
##
## > ll[[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## > ll[[2]]
## [1] 11 12 13 14 15 16 17 18 19 20
##
## > lll <- list()
##
## > lll
## list()
##
## > lll[["a"]] <- 111
##
## > lll
## $a
## [1] 111
##
##
## > lll[1]
## $a
## [1] 111
##
##
## > lll[["a"]]
## [1] 111
##
## > lll[[1]]
## [1] 111
For more details on the difference between the [ ] and [[ ]] notations for accessing list elements, see http://stackoverflow.com/questions/1169456/in-r-what-is-the-difference-between-the-and-notations-for-accessing-the
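As a short illustration of the indexing output above, the sketch below checks what each bracket form returns. The names `sub`, `elt`, and `failed` are introduced here only for the example.

```r
l <- list(a = 1:10, b = 11:20)

sub <- l[1]    # single bracket: a list of length 1 that still wraps element a
elt <- l[[1]]  # double bracket: the integer vector stored in element a

print(class(sub))
print(class(elt))

# Arithmetic works on the extracted vector.
total <- sum(l[["a"]])
print(total)

# The same call on the one-element sub-list signals an error, captured here.
failed <- inherits(try(sum(l["a"]), silent = TRUE), "try-error")
print(failed)
```

In practice this means `[[ ]]` (or `$`) is the form to use when the element itself is needed, and `[ ]` when a shorter list is wanted.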
source("../10 ListsForIfFunctionsPng/ListsForIfFunctionsPng.R", echo = TRUE)
##
## > require("tidyr")
##
## > require("dplyr")
##
## > require("jsonlite")
##
## > q = "Good"
##
## > r <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/?query=\"select * from diamonds where \\\"cut\\\" = \\'\"q\"\\'\""),
## + .... [TRUNCATED]
##
## > myplot <- function(df, x) {
## + names(df) <- c("x", "n")
## + ggplot(df, aes(x = x, y = n)) + geom_point()
## + }
##
## > categoricals <- eval(parse(text = substring(getURL(URLencode("http://129.152.144.84:5001/rest/native/?query=\"select * from diamonds\""),
## + htt .... [TRUNCATED]
##
## > ddf <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/?query=\"select * from DIAMONDS\""),
## + httpheader = c(DB = "jdbc:o ..." ... [TRUNCATED]
##
## > l <- list()
##
## > for (i in names(ddf)) {
## + if (i %in% categoricals[[1]]) {
## + r <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/? ..." ... [TRUNCATED]
##
## > png("/Users/pcannata/Mine/UT/GitRepositories/DataVisualization/RWorkshop/00 Doc/categoricals.png",
## + width = 25, height = 10, units = "in", res .... [TRUNCATED]
##
## > grid.newpage()
##
## > pushViewport(viewport(layout = grid.layout(1, 12)))
##
## > print(l[[1]], vp = viewport(layout.pos.row = 1, layout.pos.col = 1:4))
##
## > print(l[[2]], vp = viewport(layout.pos.row = 1, layout.pos.col = 5:8))
##
## > print(l[[3]], vp = viewport(layout.pos.row = 1, layout.pos.col = 9:12))
##
## > dev.off()
## pdf
## 2
##
## > myplot1 <- function(df, x) {
## + names(df) <- c("x")
## + ggplot(df, aes(x = x)) + geom_histogram()
## + }
##
## > l <- list()
##
## > for (i in names(ddf)) {
## + if (i %in% categoricals[[2]]) {
## + r <- data.frame(fromJSON(getURL(URLencode("129.152.144.84:5001/rest/native/? ..." ... [TRUNCATED]
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > png("/Users/pcannata/Mine/UT/GitRepositories/DataVisualization/RWorkshop/00 Doc/categoricals2.png",
## + width = 25, height = 20, units = "in", re .... [TRUNCATED]
##
## > grid.newpage()
##
## > pushViewport(viewport(layout = grid.layout(2, 12)))
##
## > print(l[[1]], vp = viewport(layout.pos.row = 1, layout.pos.col = 1:3))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > print(l[[2]], vp = viewport(layout.pos.row = 1, layout.pos.col = 4:6))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > print(l[[3]], vp = viewport(layout.pos.row = 1, layout.pos.col = 7:9))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > print(l[[4]], vp = viewport(layout.pos.row = 1, layout.pos.col = 10:12))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > print(l[[5]], vp = viewport(layout.pos.row = 2, layout.pos.col = 1:3))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > print(l[[6]], vp = viewport(layout.pos.row = 2, layout.pos.col = 4:6))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > print(l[[7]], vp = viewport(layout.pos.row = 2, layout.pos.col = 7:9))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > print(l[[8]], vp = viewport(layout.pos.row = 2, layout.pos.col = 10:12))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
##
## > dev.off()
## pdf
## 2
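The pattern in the output above (build a list of plots in a loop, then print each one into a cell of a grid layout) can be sketched without the database. This version is an assumption-laden stand-in, not the workshop script: it reads from the `diamonds` data set bundled with ggplot2, picks three columns by hand, writes to a hypothetical file in `tempdir()`, and sets `binwidth` explicitly so that `stat_bin` does not fall back to its default.

```r
library(ggplot2)
library(grid)

cols <- c("carat", "depth", "price")  # hand-picked numeric columns of ggplot2::diamonds
plots <- list()

for (i in cols) {
  df <- data.frame(x = diamonds[[i]])
  bw <- diff(range(df$x)) / 30        # explicit bin width, range divided by 30
  plots[[i]] <- ggplot(df, aes(x = x)) +
    geom_histogram(binwidth = bw) +
    xlab(i)
}

out_file <- file.path(tempdir(), "histograms.png")  # hypothetical output path
png(out_file, width = 15, height = 5, units = "in", res = 72)
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, length(plots))))
for (k in seq_along(plots)) {
  # Each plot goes into row 1, column k of the layout.
  print(plots[[k]], vp = viewport(layout.pos.row = 1, layout.pos.col = k))
}
invisible(dev.off())
```

Looping over `seq_along(plots)` avoids writing one `print()` call per plot, at the cost of giving every plot the same cell width.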